Evaluate Business Location Using Pedestrian Traffic: Day and Night
Authored by: Barkha Javed, Weiren Kong
Duration: 75 mins
Level: Intermediate
Pre-requisite Skills: Python
Scenario
As a business owner, I want to know how much pedestrian foot traffic occurs around me during the day and night, so that I can evaluate the suitability of the location and hours for my business.
Busy foot traffic in a business area may not always mean busy foot traffic at night.
As a business owner, I want to know how much pedestrian foot traffic I get compared to surrounding areas, so that I can assess if it is better to adapt my hours, extend or move locations.
Foot traffic flow during day or night may indicate adapting staff levels during specific hours.
Significantly low foot traffic in comparison to other streets may mean adapting business strategy or moving location.
Duration of steady foot traffic from early morning to mid afternoon only, may indicate adapting business hours to match.
What this Use Case will teach you
At the end of this use case you will understand how to:
Access and load open datasets using the Socrata API
Merge the pedestrian sensor location and hourly counts datasets
Compare day and night foot traffic, before and after Covid
Visualise foot traffic by location on an interactive map
A brief introduction to the datasets used
We examine two datasets for day and night traffic:
Pedestrian Counting System - Monthly (counts per hour)
Pedestrian Counting System - Sensor Locations
The monthly hourly counts dataset contains pedestrian traffic collected by sensors since 2009 and is updated monthly. It has 10 fields, including sensor id, date and time, year, month, day of month, day of week, time of day, sensor name and hourly counts.
The sensor locations dataset describes the sensors that detect the pedestrian traffic. It has 11 fields; the main ones of interest are sensor description, latitude, longitude, location and direction of reading.
Accessing and Loading data
#Libraries to be installed
#The -q flag makes pip give less output
!pip -q install sodapy
!pip -q install seaborn
!pip -q install pandas
!pip -q install matplotlib
!pip -q install numpy
!pip -q install nbconvert
!pip -q install keyboard
!pip -q install geopandas
!pip -q install scikit-learn
!pip -q install folium
#load libraries
import os
import time
import keyboard
from datetime import datetime
import numpy as np
import pandas as pd
from sodapy import Socrata
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm
import matplotlib.colors as colors
from matplotlib import style
style.use('ggplot')
import plotly.graph_objs as go
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')
import folium
from folium.plugins import MarkerCluster
#Socrata client connection
client = Socrata('data.melbourne.vic.gov.au', '9UAGAvkem9fqXDhBLElapjazL', None)
#Pedestrian sensor location data
sensor_data_id = "h57g-5234"
results = client.get(sensor_data_id)
sensor_location = pd.DataFrame.from_records(results)
#sensor_location.head(5)
#Convert coordinates to float and keep lat/lon copies for mapping
sensor_location[['latitude', 'longitude']] = sensor_location[['latitude', 'longitude']].astype(float)
sensor_location = sensor_location.drop('location', axis=1)
sensor_location['lat'] = sensor_location['latitude']
sensor_location['lon'] = sensor_location['longitude']
#View sensor data
sensor_location.info()
sensor_location.head(5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   sensor_id           91 non-null     object
 1   sensor_description  91 non-null     object
 2   sensor_name         91 non-null     object
 3   installation_date   91 non-null     object
 4   status              91 non-null     object
 5   note                10 non-null     object
 6   latitude            91 non-null     float64
 7   longitude           91 non-null     float64
 8   direction_1         78 non-null     object
 9   direction_2         78 non-null     object
 10  lat                 91 non-null     float64
 11  lon                 91 non-null     float64
dtypes: float64(4), object(8)
memory usage: 8.7+ KB
| | sensor_id | sensor_description | sensor_name | installation_date | status | note | latitude | longitude | direction_1 | direction_2 | lat | lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16 | Australia on Collins | Col270_T | 2009-03-30T00:00:00.000 | R | Device moved to location ID 53 (22/09/2015) | -37.815734 | 144.965210 | NaN | NaN | -37.815734 | 144.965210 |
| 1 | 50 | Faraday St-Lygon St (West) | Lyg309_T | 2017-11-30T00:00:00.000 | A | NaN | -37.798082 | 144.967210 | South | North | -37.798082 | 144.967210 |
| 2 | 73 | Bourke St - Spencer St (South) | Bou655_T | 2020-10-02T00:00:00.000 | I | NaN | -37.816957 | 144.954154 | East | West | -37.816957 | 144.954154 |
| 3 | 66 | State Library - New | QVN_T | 2020-04-06T00:00:00.000 | A | NaN | -37.810578 | 144.964443 | South | North | -37.810578 | 144.964443 |
| 4 | 59 | Building 80 RMIT | RMIT_T | 2019-02-13T00:00:00.000 | A | NaN | -37.808256 | 144.963049 | South | North | -37.808256 | 144.963049 |
#Pedestrian foot count data
#Uncomment the below to open data source, download sensor data, and store it as a csv locally.
#sensor_data_id = "b2ak-trbp"
#results = client.get(sensor_data_id, limit=7000000)
#ds_traffic = pd.DataFrame.from_records(results)
#ds_traffic.to_csv('sensor_traffic.csv', index=False)
sensor_traffic = pd.read_csv('sensor_traffic.csv')
sensor_traffic.info()
print(f'\nThe shape of dataset is {sensor_traffic.shape}. \n')
print('Below are the first few rows of this dataset: ')
sensor_traffic.head(10)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4463670 entries, 0 to 4463669
Data columns (total 10 columns):
 #   Column         Dtype
---  ------         -----
 0   id             int64
 1   date_time      object
 2   year           int64
 3   month          object
 4   mdate          int64
 5   day            object
 6   time           int64
 7   sensor_id      int64
 8   sensor_name    object
 9   hourly_counts  int64
dtypes: int64(6), object(4)
memory usage: 340.6+ MB

The shape of dataset is (4463670, 10).

Below are the first few rows of this dataset:
| | id | date_time | year | month | mdate | day | time | sensor_id | sensor_name | hourly_counts |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2887628 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 34 | Flinders St-Spark La | 300 |
| 1 | 2887629 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 39 | Alfred Place | 604 |
| 2 | 2887630 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 37 | Lygon St (East) | 216 |
| 3 | 2887631 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 40 | Lonsdale St-Spring St (West) | 627 |
| 4 | 2887632 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 36 | Queen St (West) | 774 |
| 5 | 2887633 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 29 | St Kilda Rd-Alexandra Gardens | 644 |
| 6 | 2887634 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 42 | Grattan St-Swanston St (West) | 453 |
| 7 | 2887635 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 43 | Monash Rd-Swanston St (West) | 387 |
| 8 | 2887636 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 44 | Tin Alley-Swanston St (West) | 27 |
| 9 | 2887637 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 35 | Southbank | 2691 |
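The commented-out download above pulls all rows in a single `client.get` call, which can time out on slow connections. As an alternative sketch, the dataset can be fetched in pages via the API's `limit` and `offset` parameters (the `fetch_all` helper and its page size are our own, not part of the notebook):

```python
import pandas as pd

def fetch_all(client, dataset_id, page_size=50000):
    """Fetch every record of a Socrata dataset in pages of page_size rows."""
    pages, offset = [], 0
    while True:
        batch = client.get(dataset_id, limit=page_size, offset=offset)
        if not batch:  # an empty page means we are past the last record
            break
        pages.append(pd.DataFrame.from_records(batch))
        offset += page_size
    return pd.concat(pages, ignore_index=True)

# With the Socrata client created earlier:
# ds_traffic = fetch_all(client, "b2ak-trbp")
# ds_traffic.to_csv('sensor_traffic.csv', index=False)
```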
#Add date column
sensor_traffic['date'] = pd.to_datetime(sensor_traffic.date_time).dt.date
#Add day of week column
sensor_traffic['dow'] = pd.to_datetime(sensor_traffic.date_time).dt.day_of_week
#convert sensor_id to integer
sensor_traffic['sensor_id']=sensor_traffic['sensor_id'].astype(int)
sensor_location['sensor_id']=sensor_location['sensor_id'].astype(int)
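Note that `pd.to_datetime` runs twice over the full 4.4-million-row column when adding `date` and `dow` above; parsing once and deriving both fields from the result avoids the repeated work. A sketch on a toy frame (the two sample timestamps are arbitrary):

```python
import pandas as pd

# Toy stand-in for sensor_traffic
df = pd.DataFrame({'date_time': ['2019-11-01T17:00:00.000',
                                 '2019-11-02T05:00:00.000']})

# Parse once, then derive both columns from the parsed series
parsed = pd.to_datetime(df['date_time'])
df['date'] = parsed.dt.date
df['dow'] = parsed.dt.day_of_week  # Monday=0 ... Sunday=6

print(df['dow'].tolist())  # [4, 5] -> Friday, Saturday
```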
#Merge pedestrian sensor location and foot traffic datasets
sensor_ds = pd.merge(sensor_traffic, sensor_location, on='sensor_id')
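The merge above is an inner join, so counts from any sensor without a location record are silently dropped. A quick check with pandas' `indicator` flag can reveal such rows — a sketch on toy frames with made-up sensor ids:

```python
import pandas as pd

# Toy stand-ins for sensor_traffic and sensor_location
traffic = pd.DataFrame({'sensor_id': [1, 2, 3], 'hourly_counts': [10, 20, 30]})
locations = pd.DataFrame({'sensor_id': [1, 2], 'lat': [-37.81, -37.80]})

# indicator=True adds a _merge column flagging where each row matched
check = pd.merge(traffic, locations, on='sensor_id', how='left', indicator=True)
missing = check.loc[check['_merge'] == 'left_only', 'sensor_id'].tolist()
print(missing)  # [3] -> counts with no matching sensor location
```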
#Use the year to flag traffic before Covid (2009-2019) and after Covid began (2020 onwards)
sensor_ds['pre2020_hourly_counts'] = np.where(sensor_ds['year']<2020,sensor_ds['hourly_counts'] , 0)
sensor_ds['post2019_hourly_counts'] = np.where(sensor_ds['year']>2019,sensor_ds['hourly_counts'] , 0)
#Add columns for day (5am to 5pm) and night (6pm to 4am) traffic
sensor_ds['day_counts'] = np.where(((sensor_ds['time']>4) & (sensor_ds['time']<18)),sensor_ds['hourly_counts'] , 0)
sensor_ds['night_counts'] = np.where(sensor_ds['day_counts']==0,sensor_ds['hourly_counts'], 0)
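A quick sanity check of the split: hours 5 through 17 inclusive land in `day_counts`, everything else (18-23 and 0-4) in `night_counts`. On a toy frame with the boundary hours:

```python
import numpy as np
import pandas as pd

# Boundary hours around the day/night cut-offs
toy = pd.DataFrame({'time': [4, 5, 17, 18, 0], 'hourly_counts': [1, 1, 1, 1, 1]})
toy['day_counts'] = np.where((toy['time'] > 4) & (toy['time'] < 18), toy['hourly_counts'], 0)
toy['night_counts'] = np.where(toy['day_counts'] == 0, toy['hourly_counts'], 0)

print(toy['day_counts'].tolist())    # [0, 1, 1, 0, 0]
print(toy['night_counts'].tolist())  # [1, 0, 0, 1, 1]
```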
sensor_ds.describe()
| | id | year | mdate | time | sensor_id | hourly_counts | dow | latitude | longitude | lat | lon | pre2020_hourly_counts | post2019_hourly_counts | day_counts | night_counts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 | 4.463670e+06 |
| mean | 2.232320e+06 | 2.017230e+03 | 1.574741e+01 | 1.146917e+01 | 2.700226e+01 | 4.951352e+02 | 2.999995e+00 | -3.781345e+01 | 1.449621e+02 | -3.781345e+01 | 1.449621e+02 | 4.074119e+02 | 8.772332e+01 | 3.594659e+02 | 1.356693e+02 |
| std | 1.289316e+06 | 3.551188e+00 | 8.800020e+00 | 6.938594e+00 | 1.951539e+01 | 7.509822e+02 | 2.000015e+00 | 6.187582e-03 | 8.531568e-03 | 6.187582e-03 | 8.531568e-03 | 7.443612e+02 | 2.852713e+02 | 7.131327e+02 | 3.910920e+02 |
| min | 1.000000e+00 | 2.009000e+03 | 1.000000e+00 | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 0.000000e+00 | -3.782402e+01 | 1.449297e+02 | -3.782402e+01 | 1.449297e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 25% | 1.115918e+06 | 2.015000e+03 | 8.000000e+00 | 5.000000e+00 | 1.100000e+01 | 4.300000e+01 | 1.000000e+00 | -3.781874e+01 | 1.449587e+02 | -3.781874e+01 | 1.449587e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 50% | 2.231836e+06 | 2.018000e+03 | 1.600000e+01 | 1.100000e+01 | 2.300000e+01 | 1.780000e+02 | 3.000000e+00 | -3.781381e+01 | 1.449651e+02 | -3.781381e+01 | 1.449651e+02 | 6.200000e+01 | 0.000000e+00 | 2.400000e+01 | 0.000000e+00 |
| 75% | 3.347753e+06 | 2.020000e+03 | 2.300000e+01 | 1.700000e+01 | 4.100000e+01 | 6.050000e+02 | 5.000000e+00 | -3.781102e+01 | 1.449669e+02 | -3.781102e+01 | 1.449669e+02 | 4.530000e+02 | 2.400000e+01 | 3.520000e+02 | 5.900000e+01 |
| max | 4.469141e+06 | 2.022000e+03 | 3.100000e+01 | 2.300000e+01 | 8.700000e+01 | 1.597900e+04 | 6.000000e+00 | -3.779432e+01 | 1.449747e+02 | -3.779432e+01 | 1.449747e+02 | 1.597900e+04 | 1.443700e+04 | 1.161200e+04 | 1.597900e+04 |
flag_value = 0
df_day = sensor_ds.query("day_counts > @flag_value")
print('Day info')
df_day.info()
df_night = sensor_ds.query("day_counts == @flag_value")
print('\nNight info')
df_night.info()
Day info
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2402520 entries, 0 to 4463663
Data columns (total 27 columns):
 #   Column                  Dtype
---  ------                  -----
 0   id                      int64
 1   date_time               object
 2   year                    int64
 3   month                   object
 4   mdate                   int64
 5   day                     object
 6   time                    int64
 7   sensor_id               int32
 8   sensor_name_x           object
 9   hourly_counts           int64
 10  date                    object
 11  dow                     int64
 12  sensor_description      object
 13  sensor_name_y           object
 14  installation_date       object
 15  status                  object
 16  note                    object
 17  latitude                float64
 18  longitude               float64
 19  direction_1             object
 20  direction_2             object
 21  lat                     float64
 22  lon                     float64
 23  pre2020_hourly_counts   int64
 24  post2019_hourly_counts  int64
 25  day_counts              int64
 26  night_counts            int64
dtypes: float64(4), int32(1), int64(10), object(12)
memory usage: 504.1+ MB

Night info
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2061150 entries, 1 to 4463669
Data columns (total 27 columns): same 27 columns as above
dtypes: float64(4), int32(1), int64(10), object(12)
memory usage: 432.4+ MB
Separate day and night, and before and after Covid for mapping
#split dataset to map difference between time before covid-19 and time after covid-19
df = df_day
print(df_day.head(2))
df_daypft_beforecovid = df.loc[df['year']<2020]
df_daypft_aftercovid = df.loc[df['year']>2019]
#get average hourly count for each sensor during the selected period of time
df_daypft_beforecovid_avg = df_daypft_beforecovid[['sensor_id','sensor_description','lat','lon','hourly_counts']]
df_daypft_beforecovid_avg = df_daypft_beforecovid_avg.groupby(['sensor_id','sensor_description','lat','lon',],as_index=False).agg({'hourly_counts': 'mean'})
df_daypft_aftercovid_avg = df_daypft_aftercovid[['sensor_id','sensor_description','lat','lon','hourly_counts']]
df_daypft_aftercovid_avg = df_daypft_aftercovid_avg.groupby(['sensor_id','sensor_description','lat','lon',],as_index=False).agg({'hourly_counts': 'mean'})
id date_time year month mdate day time \
0 2887628 2019-11-01T17:00:00.000 2019 November 1 Friday 17
12 2888288 2019-11-02T05:00:00.000 2019 November 2 Saturday 5
sensor_id sensor_name_x hourly_counts ... latitude longitude \
0 34 Flinders St-Spark La 300 ... -37.81538 144.974151
12 34 Flinders St-Spark La 7 ... -37.81538 144.974151
direction_1 direction_2 lat lon pre2020_hourly_counts \
0 East West -37.81538 144.974151 300
12 East West -37.81538 144.974151 7
post2019_hourly_counts day_counts night_counts
0 0 300 0
12 0 7 0
[2 rows x 27 columns]
#split dataset to see difference between time before covid-19 and time after covid-19
df = df_night
print(df.head(2))
df_night_beforecovid = df.loc[df['year']<2020]
df_night_aftercovid = df.loc[df['year']>2019]
#get average hourly count for each sensor during the selected period of time
df_night_beforecovid_avg = df_night_beforecovid[['sensor_id','sensor_description','lat','lon','pre2020_hourly_counts']]
df_night_beforecovid_avg = df_night_beforecovid_avg.groupby(['sensor_id','sensor_description','lat','lon',],as_index=False).agg({'pre2020_hourly_counts': 'mean'})
df_night_aftercovid_avg = df_night_aftercovid[['sensor_id','sensor_description','lat','lon','post2019_hourly_counts']]
df_night_aftercovid_avg = df_night_aftercovid_avg.groupby(['sensor_id','sensor_description','lat','lon',],as_index=False).agg({'post2019_hourly_counts': 'mean'})
id date_time year month mdate day time \
1 2887683 2019-11-01T18:00:00.000 2019 November 1 Friday 18
2 2887738 2019-11-01T19:00:00.000 2019 November 1 Friday 19
sensor_id sensor_name_x hourly_counts ... latitude longitude \
1 34 Flinders St-Spark La 240 ... -37.81538 144.974151
2 34 Flinders St-Spark La 158 ... -37.81538 144.974151
direction_1 direction_2 lat lon pre2020_hourly_counts \
1 East West -37.81538 144.974151 240
2 East West -37.81538 144.974151 158
post2019_hourly_counts day_counts night_counts
1 0 0 240
2 0 0 158
[2 rows x 27 columns]
Examine Pedestrian Traffic
Pedestrian traffic has decreased since 2020; the aim is to understand the patterns of day and night traffic.
#examine pre Covid and post 2019 foot traffic
ds = pd.DataFrame(sensor_ds.groupby(["time"])[["pre2020_hourly_counts","post2019_hourly_counts"]].mean())
df = ds.sort_values(by=['time'])
axs = df.plot.line(figsize=(20, 6), color=["teal","orange"])
axs.set_title('Foot Traffic by Time', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
#distribution of day and night traffic, by year
pivot = pd.DataFrame(pd.pivot_table(sensor_ds, values=['day_counts','night_counts'], index=['year'], aggfunc=np.mean))
rs = pivot.sort_values(by='year', ascending = False)
axs = rs.plot.line(figsize=(12, 5), color=['orange','navy'], legend=True);
axs.set_title('Foot Traffic by Year by day and night', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
#examine pre Covid and post 2019 foot traffic
ds = pd.DataFrame(sensor_ds.groupby(["dow","day"])[["pre2020_hourly_counts","post2019_hourly_counts"]].mean())
df = ds.sort_values(by=['dow'])
axs = df.plot.bar(figsize=(12, 4), color=["teal","orange"])
axs.set_title('Foot Traffic by Day of Week - All Hours', size=20)
axs.set_ylabel('Average hourly counts', size=14)
plt.show()
The top 20 locations by foot traffic overall are displayed below.
#top 20 locations by average day-time traffic
pivot = pd.pivot_table(sensor_ds, values='day_counts', index=['sensor_id','sensor_description'], aggfunc=np.mean)
pivot_ds = pivot['day_counts'].nlargest(n=20)
pivot_ds.plot.bar(figsize=(12, 5), color='orange', legend=True);
#by night
pivot = pd.pivot_table(sensor_ds, values='night_counts', index=['sensor_id','sensor_description'], aggfunc=np.mean)
pivot_ds = pivot['night_counts'].nlargest(n=20)
axs = pivot_ds.plot.bar(figsize=(12, 5), color='navy', legend=True);
axs.set_title('Top 20: Foot Traffic by location by day and night', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
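The day and night rankings can also be compared directly, e.g. to see which sensors make the top of both lists. A sketch with hypothetical per-sensor averages (in the notebook these would come from the two pivot tables above):

```python
import pandas as pd

# Hypothetical per-sensor average counts
avgs = pd.DataFrame({
    'sensor_id': [1, 2, 3, 4],
    'day_counts': [900, 400, 700, 100],
    'night_counts': [100, 800, 650, 50],
}).set_index('sensor_id')

# Top-2 sensors in each ranking, compared as sets
top_day = set(avgs['day_counts'].nlargest(2).index)
top_night = set(avgs['night_counts'].nlargest(2).index)

print(sorted(top_day & top_night))  # [3] -> busy both day and night
print(sorted(top_day - top_night))  # [1] -> day-only hotspot
```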
We will investigate if the traffic volumes by location are different depending on day and night traffic.
Day Economy
#day-time traffic before Covid (pre-2020)
pivot = pd.pivot_table(df_day, values='pre2020_hourly_counts', index=['sensor_id','sensor_description'], aggfunc=np.mean)
pivot_ds = pivot['pre2020_hourly_counts'].nlargest(n=20)
pivot_ds.plot.line(figsize=(12, 5), color='teal', legend=True, rot=90);
#day-time traffic after Covid began (2020 onwards)
pivot = pd.pivot_table(df_day, values='post2019_hourly_counts', index=['sensor_id','sensor_description'], aggfunc=np.mean)
pivot_ds = pivot['post2019_hourly_counts'].nlargest(n=20)
axs = pivot_ds.plot.bar(figsize=(12, 5), color='orange', legend=True, rot=90);
axs.set_title('Top 20: Foot Traffic by location by day', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
The chart above shows how significantly traffic has changed after Covid, from 2020 onwards, for the top 20 locations.
We can see changes in the traffic at each sensor before and after Covid on the map below. Click the layers icon in the top right corner of the map and select the layer to see traffic before Covid.
#Visualise day data
m = folium.Map(location=[-37.8167, 144.967], zoom_start=15) # tiles='Stamen Toner'
locations = []
for i in range(len(df_daypft_beforecovid_avg)):
row =df_daypft_beforecovid_avg.iloc[i]
location = [(row.lat,row.lon)]*int(row['hourly_counts'])
locations += location
marker_cluster = MarkerCluster(
name='day traffic before covid-19',
locations=locations,
overlay=True,
control=True,
color='red',
show=False # this removes from automatic selection in display - need to select to show data points
)
marker_cluster.add_to(m)
#next layer
locations = []
for i in range(len(df_daypft_aftercovid_avg)):
row =df_daypft_aftercovid_avg.iloc[i]
location = [(row.lat,row.lon)]*int(row['hourly_counts'])
locations += location
marker_cluster = MarkerCluster(
name='day traffic after covid-19',
locations=locations,
overlay=True,
control=True,
)
marker_cluster.add_to(m)
folium.LayerControl().add_to(m)
m